Recently, the US Congress passed a gun control bill, the most significant firearms legislation in nearly 30 years. The bill imposes tougher checks on the purchase of non-militia firearms. The reforms in the bill include:
The bill was passed after mass shootings at a supermarket in Buffalo, New York, and at a primary school in Uvalde, Texas. In October 2017, the Las Vegas shooting claimed 58 lives and left over 500 people injured. In June 2016, a shooting at a nightclub in Orlando claimed 49 lives and left 53 people wounded. These are just some of the mass shootings in the USA. I am interested in digging deeper into the issue of gun violence in the USA to uncover potential truths, patterns and trends.
Researching gun violence is important because it can help answer questions such as:
The CSV file contains data on all recorded gun violence incidents in the US between January 2013 and March 2018, inclusive. The data was downloaded from the Gun Violence Archive. The dataset consists of 239,677 observations on 29 variables.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv("C:/Users/mfaro/Downloads/DATA_01-2013_03-2018.tar/stage3.csv")
df.shape
df.head(2)
df.tail(2)
# Calculating the percentage of NA values in each column
Attributes = []
Missing_value_percentage = []
for i in df.columns:
    Attributes.append(i)
    Missing_value_percentage.append(round((df[i].isna().sum() / len(df)) * 100, 1))
data = {"Attributes": Attributes,
        "Percentage_of_missing_values": Missing_value_percentage}
data1 = pd.DataFrame(data)
data1
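The loop above can also be written as one vectorised expression. `isna()` returns a boolean frame, and the mean of each boolean column is the fraction of missing values. This is a minimal sketch on a hypothetical toy frame, not the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the gun violence data
df = pd.DataFrame({
    "state": ["Ohio", "Texas", "Ohio", "Iowa"],
    "n_guns_involved": [1, np.nan, np.nan, 2],
    "notes": [np.nan, np.nan, np.nan, "text"],
})

# Mean of each boolean column = fraction missing; scale to a percentage
missing_pct = (df.isna().mean() * 100).round(1)

# Same two-column layout as data1 above
missing_table = (missing_pct.rename_axis("Attributes")
                 .reset_index(name="Percentage_of_missing_values"))
print(missing_table)
```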
df2 = df.drop(['location_description','participant_relationship','address','incident_url','source_url','incident_url_fields_missing','n_guns_involved',
'incident_characteristics','notes','participant_age_group','participant_name','sources','state_senate_district','state_house_district','congressional_district'],axis=1)
print(df2.shape)
df2.sample(2)
df2.date.dtype
df2.date.isna().sum()
df2.date.min(), df2.date.max()
The date column is of object (string) data type and has no missing values. The data was recorded from 1 January 2013 to 31 March 2018.
df2.state.dtype
df2.state.isna().sum()
df2.groupby('state')['state'].agg('count')
matplotlib.rcParams['figure.figsize'] = (8,15)
sns.countplot(data=df2,y='state',palette='crest')
The state column is of object data type and has no missing values. From the countplot above, the top 5 states with the most incident reports are:
df2['city_or_county'].isna().sum()
len(df2.groupby('city_or_county')['city_or_county'].agg('count'))
The dataset contains information from 12,898 cities or counties in the United States. This attribute has no missing values. There are too many cities or counties to plot individually, so the cities or counties with the most and fewest incidents of gun-related violence are analysed later.
df2['n_killed'].isna().sum()
df2['n_killed'] = df2['n_killed'].astype(int)
df2['n_killed'].dtype
df2['n_killed'].describe()
df2['n_killed'].value_counts()
The data records 239,677 incidents of gun violence in the United States. No deaths were recorded in 185,835 of these incidents. Of the rest, 48,436 incidents recorded 1 death, 4,604 recorded 2 deaths, 595 recorded 3 deaths and 139 recorded 4 deaths. Incidents in which 10 or more people died are rare: incidents with 10, 11, 16, 17, 27 and 50 deaths have each been recorded only once.
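A heavily skewed count like `n_killed` is often easier to summarise with a few buckets, so the rare extreme incidents stay visible. This is a minimal sketch using `pd.cut` on hypothetical values. The bucket edges are an arbitrary choice made for illustration.

```python
import pandas as pd

# Hypothetical per-incident death counts (stand-in for df2['n_killed'])
n_killed = pd.Series([0, 0, 1, 0, 2, 1, 0, 3, 12, 0, 5])

# Right-inclusive bins: (-1, 0] -> "0", (0, 1] -> "1", ..., (9, inf] -> "10+"
buckets = pd.cut(n_killed, bins=[-1, 0, 1, 2, 4, 9, float("inf")],
                 labels=["0", "1", "2", "3-4", "5-9", "10+"])
bucket_counts = buckets.value_counts(sort=False)
print(bucket_counts)

# Number of incidents with 10 or more deaths
print((n_killed >= 10).sum())
```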
plt.figure(figsize=(10,7))
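# histplot replaces the deprecated distplot; discrete=True draws one bar per integer count
sns.histplot(data=df2, x='n_killed', discrete=True)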
plt.title("Distribution of deaths from gun violence")
plt.xlabel("Number of deaths")
plt.ylabel("Frequency")
df2['n_injured'].isna().sum()
df2['n_injured'].dtype
df2['n_injured'].describe()
df2['n_injured'].value_counts()
Of the 239,677 reported incidents, 142,487 resulted in no injuries and 81,986 resulted in exactly one injury. A small number of extreme incidents, shown in the counts above, injured more than 20 people each. These are rare: for instance, one incident resulted in 53 injuries and another in 25.
plt.figure(figsize=(10,7))
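# histplot replaces the deprecated distplot; discrete=True draws one bar per integer count
sns.histplot(data=df2, x='n_injured', discrete=True)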
plt.title("Distribution of injuries from gun violence")
plt.xlabel("Number of injuries")
plt.ylabel("Frequency")
def StringToDic(S1):
    """Parse a string of 'index::value' pairs separated by '||' into a dictionary."""
    dic1 = {}
    list1 = str(S1).split('||')
    for i in list1:
        parts = i.split('::')
        # Skip fragments without an 'index::value' pair (e.g. missing values read as 'nan')
        if len(parts) >= 2:
            dic1[parts[0]] = parts[1]
    return dic1
def CountDfValue(df, col='gun_type_dic'):
    """Count how often each value occurs across the dictionaries stored in column `col`."""
    newDic = {}
    for row_dict in df[col]:
        for value in row_dict.values():
            newDic[value] = newDic.get(value, 0) + 1
    return newDic
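As an alternative to `StringToDic` followed by `CountDfValue`, pandas can parse and count the `index::value` pairs in one vectorised step with `Series.str.extractall`. This is a sketch on hypothetical sample strings. The regular expression is an assumption about the format. Unlike `split('||')`, it ends each value at any `|` character.

```python
import pandas as pd

# Hypothetical values in the 'index::value||index::value' format
gun_type = pd.Series([
    "0::Handgun||1::Handgun",
    "0::Rifle",
    None,  # missing entries produce no matches
    "0::Unknown||1::9mm||2::Handgun",
])

# One output row per match: the index key and its value
pairs = gun_type.str.extractall(r"(?P<key>\d+)::(?P<value>[^|]+)")
counts = pairs["value"].value_counts()
print(counts)
```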
df2['gun_stolen_dic'] = df2['gun_stolen'].apply(lambda x: StringToDic(x))
df2['gun_stolen_dic'].sample(10)
dicGunstolen = CountDfValue(df2,'gun_stolen_dic')
dicGunstolen
# Tabulating the gun-status counts from dicGunstolen
dff = pd.DataFrame([['Unknown',172525],['Not_stolen',1804],['Stolen',17610]],columns=['Gun_status','Number_of_guns'])
dff
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0,0)
plt.style.use('ggplot')
plt.title("The status of guns involved in crime in the US(Stolen/Not-stolen):2013-2018")
plt.pie(x=dff['Number_of_guns'],explode=explode,labels=dff['Gun_status'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()
The status of the majority of guns involved in crime, roughly 90% (89.89%), is unknown. About 9% of the guns involved in crime were reported stolen, and only about 1% were confirmed not stolen.
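The donut-chart percentages can be reproduced directly from the counts in `dff`. Each count is divided by the total of 191,939 guns with a recorded status field.

```python
import pandas as pd

# Gun-status counts as tabulated in dff above
status_counts = pd.Series({"Unknown": 172525, "Not_stolen": 1804, "Stolen": 17610})

# Share of each status as a percentage of all guns in the table
shares = (status_counts / status_counts.sum() * 100).round(2)
print(shares)
```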
df2['gun_type_dic'] = df2['gun_type'].apply(lambda x: StringToDic(x))
df2['gun_type_dic'].sample(10)
dicGuntype = CountDfValue(df2,'gun_type_dic')
del dicGuntype['Unknown']
dicGuntype
index = list(range(0,26))
new = pd.DataFrame.from_dict([dicGuntype])
gun_type_df = pd.DataFrame.transpose(new)
gun_type_df.reset_index(level=0,inplace=True)
gun_type_df.rename(columns={'index':'gun_type',0:'incidents'},inplace=True)
plt.figure(figsize=(12,5))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(x='gun_type',y='incidents',data=gun_type_df,color='blue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Types of guns used to commit crime in the US:2013-2018")
plt.ylabel("Number of guns recorded")
The handgun is the most commonly recorded type of gun used to commit crime. Other common types of guns recorded in the data are 9mm, 223 Rem, Rifle, Shotgun and 22LR. The least recorded gun type in the data is the 28 gauge.
# Missing values in the latitude and longitude columns
df2['latitude'].isnull().sum(),df2['longitude'].isnull().sum()
Rows with missing coordinates will be removed when performing multivariate analysis.
df2['participant_age']
# Converting each entry of the column into a dictionary
df2['participant_age_dic'] = df2['participant_age'].apply(lambda x: StringToDic(x))
df2['participant_age_dic']
dicparticipant_age = CountDfValue(df2,'participant_age_dic')
# Removing the implausible ages 209 and 311, which appear to be data-entry errors
del dicparticipant_age['209']
del dicparticipant_age['311']
new2 = pd.DataFrame.from_dict([dicparticipant_age])
new2 = pd.DataFrame.transpose(new2)
new2.reset_index(level=0,inplace=True)
new2.head()
new2.rename(columns = {'index':'participant_age',0:'Number_of_participants'},inplace=True)
new2.head()
new2['participant_age'] = pd.to_numeric(new2['participant_age'],errors='coerce')
new2['Number_of_participants'] = pd.to_numeric(new2['Number_of_participants'],errors='coerce')
new2['Number_of_participants'].dtype,new2['participant_age'].dtype
new2.shape
new2.head()
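Instead of deleting specific keys such as '209' and '311', implausible ages could be filtered by range after the numeric conversion. This is a minimal sketch on a hypothetical dictionary. The upper bound of 110 is an assumption and is not stated by the data source.

```python
import pandas as pd

# Hypothetical age -> participant-count dictionary, shaped like dicparticipant_age
age_counts = {"19": 120, "25": 80, "209": 1, "311": 1, "abc": 2}

ages = pd.DataFrame(list(age_counts.items()),
                    columns=["participant_age", "Number_of_participants"])
# Non-numeric keys become NaN
ages["participant_age"] = pd.to_numeric(ages["participant_age"], errors="coerce")

# Assumed plausibility bound; NaN ages fail between() and are dropped too
MAX_PLAUSIBLE_AGE = 110
ages = ages[ages["participant_age"].between(0, MAX_PLAUSIBLE_AGE)]
print(ages)
```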
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
# Enable offline plotly rendering inside the notebook
init_notebook_mode(connected=True)
trace1 = go.Bar(
    x=new2.participant_age,
    y=new2.Number_of_participants,
    name='Age distribution of participants',
    marker=dict(color='rgb(55, 83, 109)'))
data = [trace1]
layout = go.Layout(
    title='Age Distribution of Participants',
    xaxis=dict(
        tickfont=dict(size=14, color='rgb(107, 107, 107)'),
        range=[0, 100]
    ),
    yaxis=dict(
        # title=dict(text=..., font=...) replaces the deprecated 'titlefont' property
        title=dict(text='Count', font=dict(size=16, color='rgb(107, 107, 107)')),
        tickfont=dict(size=14, color='rgb(107, 107, 107)')
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
The figure above shows the age distribution of participants in gun violence, covering both suspects and victims. Most participants are aged between 16 and 40. The distribution is skewed to the right: its right-hand tail, towards older ages, is longer.
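The right skew can be checked numerically. `new2` stores one row per age together with a count, so this sketch (on hypothetical counts) first expands it with `numpy.repeat` and then computes pandas' sample skewness. A positive skewness and a mean above the median both indicate a longer right tail.

```python
import numpy as np
import pandas as pd

# Hypothetical (age, count) table shaped like new2
new2 = pd.DataFrame({"participant_age": [16, 20, 25, 30, 40, 60, 80],
                     "Number_of_participants": [50, 120, 90, 60, 30, 8, 2]})

# Expand to one value per participant so the statistics weight each person equally
expanded = pd.Series(np.repeat(new2["participant_age"].to_numpy(),
                               new2["Number_of_participants"].to_numpy()))

print(expanded.skew())                    # positive -> right-skewed
print(expanded.mean(), expanded.median())
```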
df2['participant_gender']
This column uses the same 'index::value' format as the columns above, so we use the StringToDic and CountDfValue functions to count the number of males and females involved in gun-related crime. The count for each gender includes both victims and suspects.
# Converting each entry of the column into a dictionary
df2['participant_gender_dic'] = df2['participant_gender'].apply(lambda x: StringToDic(x))
df2['participant_gender_dic']
dicGender = CountDfValue(df2,'participant_gender_dic')
del dicGender['Male, female']
dicGender
dff = pd.DataFrame({'Gender': ['Male','Female'],'Participants': [304102,42376]})
dff
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of participants by gender involved in gun-related violence: 2013-2018")
plt.pie(x=dff['Participants'],explode=explode,labels=dff['Gender'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()
From the donut chart above, 88% of all participants (suspects and victims) are male and 12% are female.
# Checking the format of the column
df2['participant_status'].head()
This column has the same format as the columns above, so we again use the StringToDic and CountDfValue functions, this time to count participants by status (Killed, Injured, Unharmed, Arrested and combinations of these). The counts include both victims and suspects.
df2['participant_status_dic'] = df2['participant_status'].apply(lambda x: StringToDic(x))
df2['participant_status_dic'].head(3)
dicStatus = CountDfValue(df2,'participant_status_dic')
dicStatus
As the dictionary above shows, some entries do not make sense, for example Killed and Unharmed, Killed and Arrested, Killed and Injured, Injured and Unharmed, and Killed, Unharmed and Arrested. These entries, together with Injured, Unharmed and Arrested, are removed from the dictionary.
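Rather than deleting each combination by name, the same cleanup could be expressed as a rule: drop any status that contains a contradictory pair. This is a sketch on hypothetical counts. The list of contradictory pairs mirrors the combinations removed here and is a judgement call, not a property of the data.

```python
# Pairs of statuses treated as contradictory (mirrors the deletions in this notebook)
CONTRADICTORY_PAIRS = [
    {"Killed", "Unharmed"},
    {"Killed", "Arrested"},
    {"Killed", "Injured"},
    {"Injured", "Unharmed"},
]

def drop_contradictory(status_counts):
    """Return a copy of status_counts without statuses containing a contradictory pair."""
    kept = {}
    for status, count in status_counts.items():
        parts = {p.strip() for p in status.split(",")}
        # A status is dropped if any contradictory pair is a subset of its parts
        if not any(pair <= parts for pair in CONTRADICTORY_PAIRS):
            kept[status] = count
    return kept

# Hypothetical counts in the same shape as dicStatus
example = {"Killed": 10, "Injured": 20, "Unharmed, Arrested": 5,
           "Killed, Unharmed, Arrested": 1, "Injured, Unharmed": 2}
cleaned = drop_contradictory(example)
print(cleaned)
```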
del dicStatus['Killed, Unharmed, Arrested']
del dicStatus['Injured, Unharmed']
del dicStatus['Killed, Injured']
del dicStatus['Killed, Unharmed']
del dicStatus['Killed, Arrested']
del dicStatus['Injured, Unharmed, Arrested']
dicStatus
new3 = pd.DataFrame.from_dict([dicStatus])
new3.head()
new3 = pd.DataFrame.transpose(new3)
new3 = new3.reset_index(level=0)
new3
new3.rename(columns={'index':'Status',0:'Participants'},inplace= True)
new3.head()
plt.figure(figsize=(12,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(x='Status',y='Participants',data=new3,color='blue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Participants' status")
plt.ylabel("Number of participants")
df2['participant_type'].head()
The participant_type column has the same format, so we use the StringToDic and CountDfValue functions to count how many participants are recorded as victims and how many as suspects.
df2['participant_type_dic'] = df2['participant_type'].apply(lambda x: StringToDic(x))
df2['participant_type_dic'].head()
participanttype_dic = CountDfValue(df2,'participant_type_dic')
participanttype_dic
dff2 = pd.DataFrame({'Status':['Victim','Suspect'],'Frequency':[189600,195913]})
dff2
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of victims and suspects involved in gun-related violence: 2013-2018")
plt.pie(x=dff2['Frequency'],explode=explode,labels=dff2['Status'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()
The donut chart above shows that the numbers of victims and suspects are roughly balanced: 49% of the participants are victims and 51% are suspects.
df3 = df2.copy()
df3.date.dtype
df3['date'] = pd.to_datetime(df3.date)
df3.date.dtype
# Extracting the year, month and weekday (Monday = 0, ..., Sunday = 6) from each date
df3['year'] = df3['date'].dt.year
df3['month'] = df3['date'].dt.month
df3['day'] = df3['date'].dt.weekday
df3.sample(3)
y_years = df3.groupby('year')['incident_id'].count().index.values
y_years
x_killed = df3.groupby('year')['n_killed'].sum()
n_killeddf = pd.DataFrame({'year':y_years,'Number_of_people':x_killed})
n_killeddf
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='year',data=n_killeddf,color='#1f77b4')
plt.title("Number of people killed in the US per year because of gun violence: 2013-2018")
plt.ylabel("Number of people")
The data records only 278 incidents between January and December 2013, and these incidents resulted in 317 deaths. This is surprisingly low, since every subsequent full year has more than 50,000 recorded incidents, which suggests the 2013 data is incomplete. The number of people killed by gun violence increased by 24% from 2014 to 2017. The 2018 data covers only January to March, which explains the sharp drop in deaths shown for 2018 in the figure above.
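The year-on-year comparisons use the percentage change (value in 2017 minus value in 2014) divided by the value in 2014, times 100. Only full years are compared, because 2013 has just 278 incidents and 2018 covers three months. This small helper is shown on hypothetical totals; `percent_change` is an illustrative name, not part of pandas.

```python
import pandas as pd

def percent_change(series, start, end):
    """Percentage change in `series` between the index labels `start` and `end`."""
    return (series[end] - series[start]) / series[start] * 100

# Hypothetical yearly totals, shaped like x_killed (index = year)
yearly = pd.Series({2014: 1000, 2015: 1050, 2016: 1150, 2017: 1240})
print(round(percent_change(yearly, 2014, 2017), 1))  # 24.0
```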
x_injured = df3.groupby('year')['n_injured'].sum()
n_injureddf = pd.DataFrame({'year':y_years,'Number_of_people':x_injured})
n_injureddf
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='year',data=n_injureddf,color='#1f77b4')
plt.title("Number of people injured in the US per year because of gun violence: 2013-2018")
plt.ylabel("Number of people")
The 278 incidents reported in 2013 resulted in 979 injuries. The number of people injured increased by 34% from 2014 to 2017, following a similar trend to the number of people killed. The sharp drop in injuries in 2018 is because incidents were recorded only from January to March 2018.
y_months = df3.groupby('month')['incident_id'].count().index.values
y_months
months = np.array(['x','January','February','March','April','May','June','July','August','September','October','November','December'])
months[y_months]
month_killed = df3.groupby('month')['n_killed'].sum()
df_month_killed = pd.DataFrame({'month':months[y_months],'Number_of_people':month_killed},index=[1,2,3,4,5,6,7,8,9,10,11,12])
df_month_killed
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='month',data=df_month_killed,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of people killed in the US per month because of gun violence: 2013-2018")
plt.ylabel("Number of people")
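The figure above shows that the number of deaths per month is roughly uniform. However, January recorded the highest number of deaths, followed by March. Note that January, February and March are covered in six calendar years of data (2013 to 2018), while the other months are covered in only five, which may partly explain their higher totals.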
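Not every month appears in the same number of years, so one way to compare months fairly is to average each month's total over the years in which it appears. This is a sketch on hypothetical incidents. It assumes every covered month-year contains at least one incident, so that `nunique()` of the year equals the number of years covered.

```python
import pandas as pd

# Hypothetical incidents spanning Jan 2017 - Mar 2018: Jan and Mar appear in two years
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-05", "2017-02-10", "2017-04-01",
                            "2018-01-07", "2018-03-03", "2017-03-15"]),
    "n_killed": [1, 0, 2, 1, 1, 0],
})
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

totals = df.groupby("month")["n_killed"].sum()
years_observed = df.groupby("month")["year"].nunique()

# Average deaths per occurrence of each month
avg_per_year = totals / years_observed
print(avg_per_year)
```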
x_inc = df3.groupby('year')['incident_id'].agg('count')
x_inc
inc_df_year = pd.DataFrame({'year':y_years,'Incidents':x_inc})
inc_df_year
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Incidents',x='year',data=inc_df_year,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence incidents per year in the US: 2013-2018")
plt.ylabel("Number of Incidents")
2013 recorded the fewest incidents of gun-related violence. The number of incidents reported increased by 18% from 2014 to 2017. The sharp drop in 2018 is because data is available for only 3 months (January to March). The number of incidents follows a similar trend to the numbers of people killed and injured.
inc_month = df3.groupby('month')['incident_id'].agg('count')
inc_month_df = pd.DataFrame({'month':months[y_months],'Incidents':inc_month})
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Incidents',x='month',data=inc_month_df,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence incidents each month in the US: 2013-2018")
plt.ylabel("Number of Incidents")
January recorded the highest number of incidents of gun-related violence, followed by March. Otherwise, the number of incidents recorded each month is roughly uniform.
index = df3.groupby('day')['incident_id'].count().index.values
Incidents = df3.groupby('day')['incident_id'].agg('count')
days = np.array(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])
df1_day = pd.DataFrame({'day':days[index],'Incidents':Incidents},index=[0,1,2,3,4,5,6])
df1_day
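pandas can also produce weekday names directly with `Series.dt.day_name()`. An ordered `Categorical` keeps the days in Monday-to-Sunday order and also reports days with no incidents. This is a sketch on hypothetical dates. `day_name()` returns English names under the default locale.

```python
import pandas as pd

# Hypothetical incident dates (2018-03-01 was a Thursday)
dates = pd.Series(pd.to_datetime(["2018-03-03", "2018-03-04", "2018-03-04",
                                  "2018-03-08", "2018-03-05"]))

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
day = pd.Categorical(dates.dt.day_name(), categories=day_order, ordered=True)

# sort=False keeps category order; days without incidents get a count of 0
incidents_per_day = pd.Series(day).value_counts(sort=False)
print(incidents_per_day)
```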
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Incidents',x='day',data=df1_day,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence incidents each day in the US: 2013-2018")
plt.ylabel("Number of Incidents")
The figure above shows the number of gun violence incidents recorded on each day of the week. Saturday and Sunday recorded the highest numbers of incidents, and Thursday the lowest. Incidents reported on Saturday and Sunday are 11% and 14% higher, respectively, than incidents reported on Thursday.
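The day-of-week comparisons express each day relative to the day with the fewest incidents. This is a sketch on hypothetical counts: `idxmin()` finds the baseline day, and dividing by its count gives each day's percentage above it.

```python
import pandas as pd

# Hypothetical incident counts per weekday
per_day = pd.Series({"Monday": 100, "Thursday": 90, "Saturday": 99, "Sunday": 108})

baseline = per_day.idxmin()  # day with the fewest incidents
pct_above_baseline = (per_day / per_day[baseline] - 1) * 100
print(baseline)
print(pct_above_baseline.round(1))
```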
nkilled_d = df3.groupby('day')['n_killed'].sum()
dayk_df = pd.DataFrame({'day':days[index],'Number_of_deaths':nkilled_d},index=[0,1,2,3,4,5,6])
dayk_df
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_deaths',x='day',data=dayk_df,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence deaths each day in the US: 2013-2018")
plt.ylabel("Number of deaths")
The figure above shows the number of deaths caused by gun-related violence on each day of the week. Saturday and Sunday recorded the highest numbers of deaths, and Thursday the lowest. Deaths recorded on Saturday are 27% higher than on Thursday, and deaths recorded on Sunday are 32% higher than on Thursday.
injured_day = df3.groupby('day')['n_injured'].sum()
injured_ddf = pd.DataFrame({'day':days[index],'Number_of_people':injured_day},index = [0,1,2,3,4,5,6])
injured_ddf
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='day',data=injured_ddf,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence injuries each day in the US: 2013-2018")
plt.ylabel("Number of injuries")
The highest number of injuries caused by gun-related violence was recorded on Sunday, followed by Saturday, and the lowest on Thursday. Injuries recorded on Sunday are 49% higher than on Thursday, and injuries recorded on Saturday are 42% higher than on Thursday.
state_incident = df3.groupby('state')['state'].agg('count')
states = df3.groupby('state')['state'].count().index.values
data = {'State': states,'Number_of_incidents':state_incident}
df_state = pd.DataFrame(data)
top10_incidents = df_state.loc[:,['State','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=False).head(10)
top10_incidents = top10_incidents.reset_index(drop=True)
top10_incidents
plt.figure(figsize=(8,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='State',data=top10_incidents,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA States:2013-2018")
plt.ylabel("Number of incidents")
bottom10_incidents = df_state.loc[:,['State','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=True).head(10)
bottom10_incidents = bottom10_incidents.reset_index(drop=True)
bottom10_incidents
plt.figure(figsize=(8,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='State',data=bottom10_incidents,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Bottom 10 USA States:2013-2018")
plt.ylabel("Number of incidents")
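The top and bottom rankings can also be built with `value_counts()` followed by `nlargest` or `nsmallest`. This is a sketch on a hypothetical state column. When several states tie at the cut-off, `keep="first"` (the default) silently picks one of them, while `keep="all"` returns every tied state.

```python
import pandas as pd

# Hypothetical state column (one row per incident)
state = pd.Series(["Illinois", "California", "Illinois", "Florida", "Hawaii",
                   "Illinois", "California", "Vermont"])

per_state = state.value_counts()
top3 = per_state.nlargest(3)
bottom3 = per_state.nsmallest(3)
# With keep="all", every state tied at the last place is included
bottom_with_ties = per_state.nsmallest(1, keep="all")
print(top3, bottom3, bottom_with_ties, sep="\n")
```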
cities = df3.groupby('city_or_county')['city_or_county'].count().index.values
Number_of_incidents = df3.groupby('city_or_county')['city_or_county'].agg('count')
data = {'city': cities,'Number_of_incidents':Number_of_incidents}
df_city = pd.DataFrame(data)
top10_city = df_city.loc[:,['city','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=False).head(10)
top10_city = top10_city.reset_index(drop=True)
top10_city
plt.figure(figsize=(8,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='city',data=top10_city,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA Cities or Counties:2013-2018")
plt.ylabel("Number of incidents")
bottom10_city = df_city.loc[:,['city','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=True).head(10)
bottom10_city = bottom10_city.reset_index(drop=True)
bottom10_city
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='city',data=bottom10_city,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Bottom 10 USA Cities or Counties:2013-2018")
plt.ylabel("Number of incidents")
state = df3.groupby('state')['n_killed'].count().index.values
# sum() totals the deaths per state; count() would only count the incidents
Number_of_deaths = df3.groupby('state')['n_killed'].sum()
data = {'State':state,'Count':Number_of_deaths}
df_death = pd.DataFrame(data)
top10 = df_death.loc[:,['State','Count']].sort_values(by='Count',ascending=False).head(10)
top10 = top10.reset_index(drop=True)
top10
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='State',data=top10,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA States with the Highest Number of Deaths Resulting from Gun Violence:2013-2018")
plt.ylabel("Number of deaths")
bottom10 = df_death.loc[:,['State','Count']].sort_values(by='Count',ascending=True).head(10)
bottom10 = bottom10.reset_index(drop=True)
bottom10
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='State',data=bottom10,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Bottom 10 USA States with the Lowest Number of Deaths Resulting from Gun Violence: 2013-2018")
plt.ylabel("Number of deaths")
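When ranking by deaths, the choice of aggregation matters. `count()` returns the number of non-missing rows per group (that is, incidents), while `sum()` adds up `n_killed` (that is, deaths). This is a minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical incidents: two in Texas (0 and 3 deaths), one in Ohio (1 death)
incidents = pd.DataFrame({"state": ["Texas", "Texas", "Ohio"],
                          "n_killed": [0, 3, 1]})

rows_per_state = incidents.groupby("state")["n_killed"].count()   # incidents
deaths_per_state = incidents.groupby("state")["n_killed"].sum()   # deaths
print(rows_per_state.to_dict(), deaths_per_state.to_dict())
```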
city = df3.groupby('city_or_county')['n_killed'].count().index.values
# sum() totals the deaths per city or county; count() would only count the incidents
Number_of_deaths = df3.groupby('city_or_county')['n_killed'].sum()
data = {'city':city,'Count':Number_of_deaths}
df_city = pd.DataFrame(data)
# Top 10 cities or counties with highest number of deaths
top10_df = df_city.loc[:,['city','Count']].sort_values(by='Count',ascending=False).head(10)
top10_df = top10_df.reset_index(drop=True)
top10_df
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='city',data=top10_df,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA Cities with the Highest Number of Deaths Resulting from Gun Violence:2013-2018")
plt.ylabel("Number of deaths")
bottom10_df = df_city.loc[:,['city','Count']].sort_values(by='Count',ascending=True).head(10)
bottom10_df = bottom10_df.reset_index(drop=True)
bottom10_df
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='city',data=bottom10_df,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Bottom 10 USA Cities or Counties with the Lowest Number of Deaths Resulting from Gun Violence: 2013-2018")
plt.ylabel("Number of deaths")
mappingCol1 = 'participant_type_dic'
def MapRows(df, mappingCol1, mappingCol2):
    """Split the values in mappingCol2 into victims and suspects, using the participant type in mappingCol1."""
    newDic = {'Victim': [], 'Suspect': []}
    for rowName, row in df.iterrows():
        for keys, values in row[mappingCol1].items():
            if (keys in row[mappingCol2]) and (values == 'Victim'):
                newDic['Victim'].append(row[mappingCol2][keys])
            elif (keys in row[mappingCol2]) and ('Suspect' in values):
                newDic['Suspect'].append(row[mappingCol2][keys])
    return newDic
mappingCol2 = 'participant_age_dic'
mappingCol3 = 'participant_gender_dic'
Map_type_age = MapRows(df3,mappingCol1,mappingCol2)
Map_type_gender = MapRows(df3,mappingCol1,mappingCol3)
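The core idea of `MapRows` is that the participant dictionaries share index keys. Participant "1" in the type column is the same person as participant "1" in the age or gender column. This single-incident sketch shows the pairing. `map_by_role` and the sample dictionaries are hypothetical, and the type labels are assumed to match the `'Victim'` and `'Suspect' in values` checks used above.

```python
def map_by_role(types, values):
    """Group one incident's values by participant role, matching on the shared index key."""
    out = {"Victim": [], "Suspect": []}
    for key, role in types.items():
        if key not in values:
            continue  # no value recorded for this participant
        if role == "Victim":
            out["Victim"].append(values[key])
        elif "Suspect" in role:
            out["Suspect"].append(values[key])
    return out

# Hypothetical parsed dictionaries for a single incident (participant index -> value)
participant_type = {"0": "Victim", "1": "Subject-Suspect", "2": "Victim"}
participant_age = {"0": "34", "1": "19"}

roles = map_by_role(participant_type, participant_age)
print(roles)  # {'Victim': ['34'], 'Suspect': ['19']}
```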
def countDic(L):
    """Count how many times each item occurs in list L."""
    dic = {}
    for i in L:
        if i not in dic:
            dic[i] = 1
        else:
            dic[i] += 1
    return dic
vic_age_list = list(countDic(Map_type_age['Victim']).keys())
vic_age_count = list(countDic(Map_type_age['Victim']).values())
data = pd.DataFrame({'Age':vic_age_list,'Count':vic_age_count})
data['Age'] = pd.to_numeric(data['Age'],errors='coerce')
data['Age'].dtype
data = data.loc[:,['Age','Count']].sort_values(by='Age',ascending=True)
trace1 = go.Bar(
    x=data.Age,
    y=data.Count,
    name='Victims Age Distribution',
    marker=dict(color='rgb(55,83,109)'))
data = [trace1]
layout = go.Layout(
    title="Victims' Age Distribution",
    xaxis=dict(
        tickfont=dict(size=14, color='rgb(107, 107, 107)'),
        range=[0, 100]
    ),
    yaxis=dict(
        # title=dict(text=..., font=...) replaces the deprecated 'titlefont' property
        title=dict(text='Count', font=dict(size=16, color='rgb(107, 107, 107)')),
        tickfont=dict(size=14, color='rgb(107, 107, 107)')
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
The distribution of victims' ages is skewed to the right, with a long right tail. The most affected ages are 14 to 45 years, with at least 1,000 victims recorded at each age. The largest number of victims were aged 19. As the histogram shows, there are also victims younger than 14: the data records 89 victims under 1 year old and 210 victims aged 1.
sus_age_list = list(countDic(Map_type_age['Suspect']).keys())
sus_age_count = list(countDic(Map_type_age['Suspect']).values())
data2 = pd.DataFrame({'Age':sus_age_list,'Count':sus_age_count})
data2['Age'] = pd.to_numeric(data2['Age'],errors='coerce')
data2 = data2.sort_values(by='Age',ascending=True)
trace1 = go.Bar(
    x=data2.Age,
    y=data2.Count,
    name='Suspects Age Distribution',
    marker=dict(color='rgb(55,83,109)'))
data = [trace1]
layout = go.Layout(
    title="Suspects' Age Distribution",
    xaxis=dict(
        tickfont=dict(size=14, color='rgb(107, 107, 107)'),
        range=[0, 100]
    ),
    yaxis=dict(
        # title=dict(text=..., font=...) replaces the deprecated 'titlefont' property
        title=dict(text='Count', font=dict(size=16, color='rgb(107, 107, 107)')),
        tickfont=dict(size=14, color='rgb(107, 107, 107)')
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
The distribution of suspects' ages is also skewed to the right, with a long right tail. Most suspects are aged 15 to 45, and the largest numbers of suspects are aged 18 and 19. There are also records of suspects younger than 4, which are probably data-entry errors. In addition, the data records 36 suspects aged 5, 41 aged 6, 31 aged 7, 37 aged 8, 38 aged 9 and 48 aged 10, which shows that minors have had access to guns.
vic_gender_list = list(countDic(Map_type_gender['Victim']).keys())
vic_gender_count = list(countDic(Map_type_gender['Victim']).values())
victims_gender = pd.DataFrame({'Gender':vic_gender_list,'Count':vic_gender_count})
# Dropping the ambiguous 'Male, female' entry
victims_gender = victims_gender[victims_gender['Gender'] != 'Male, female']
victims_gender
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of victims by gender involved in gun-related violence: 2013-2018")
plt.pie(x=victims_gender['Count'],explode=explode,labels=victims_gender['Gender'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()
82% of the victims are male and 18% are female.
sus_gender_list = list(countDic(Map_type_gender['Suspect']).keys())
sus_gender_count = list(countDic(Map_type_gender['Suspect']).values())
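suspect_gender = pd.DataFrame({'Gender':sus_gender_list,'Count':sus_gender_count})
# Dropping the ambiguous 'Male, female' entry, if present, so the chart has two slices
suspect_gender = suspect_gender[suspect_gender['Gender'] != 'Male, female']
suspect_gender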
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of suspects by gender involved in gun-related violence: 2013-2018")
plt.pie(x=suspect_gender['Count'],explode=explode,labels=suspect_gender['Gender'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()
93% of the suspects are male and 7% are female.
The data used in this analysis contains records of gun violence incidents from January 2013 to March 2018, covering all states in the United States of America. There are 239,677 recorded incidents of gun violence. 185,835 of these incidents recorded no deaths, and the remaining incidents resulted in at least 1 death. 97,190 incidents recorded at least 1 injury. It is concerning that the status of 90% of the guns involved in gun violence is unknown. 9.2% of the guns are confirmed stolen and 1% are confirmed not stolen. The handgun is the most commonly used type of weapon to commit crime. Other gun types among the most frequently recorded in the data are 9mm, 223 Rem, shotgun and 22LR.
The data analysed has only 278 incidents recorded in 2013, which may be the result of errors or omissions in data collection. The number of incidents recorded increased by 18.4% from 2014 to 2017. Over the same period, the number of deaths caused by gun violence increased by 24% and the number of injuries associated with gun violence increased by 34%. The analysis also showed that gun violence incidents are lowest on Thursdays and up to 14% higher on Sundays. The highest numbers of incidents, injuries and deaths are recorded on Saturdays and Sundays.
Most suspects are aged between 15 and 45. The modal age of suspects is 19, followed by 18. There are also records of 36 suspects aged 5, 41 aged 6, 31 aged 7, 37 aged 8, 38 aged 9 and 48 aged 10. The gun control bill recently passed by Congress imposes tougher background checks on buyers under the age of 21. This addresses one of the issues identified here, since the most common suspect ages are 18 and 19. However, I think minors' access to guns needs to be investigated further. 93% of suspects are male and 7% are female.
The states which recorded the highest and lowest number of incidents:
top10_incidents,bottom10_incidents
Cities which recorded the highest and lowest number of incidents:
top10_city,bottom10_city
States which recorded the highest and lowest number of deaths:
top10,bottom10
Cities which recorded the highest and lowest number of deaths:
top10_df,bottom10_df
Now consider the top 3 states with the most incidents of gun violence: Illinois, California and Florida. In 2013, Illinois adopted the Firearm Concealed Carry Act, which allows individuals to obtain a licence to carry concealed handguns in public. A licence is not required to carry a concealed handgun on one's own property, including one's home or place of business. Nor is a licence required to carry a concealed handgun on the land or in the home of another person with that person's permission. To purchase a firearm, a person must be at least 21 years old; otherwise parental consent is needed. A buyer must also hold a Firearm Owner's Identification (FOID) card, but persons with a concealed carry licence can purchase a handgun without the FOID card.
In California, a US citizen or legal resident at least 18 years old may carry a handgun anywhere within his or her place of residence, place of business or private property without a permit or licence. Concealed carry elsewhere is legal with a California concealed carry weapon (CCW) licence. The minimum age allowed is 18. California does not require a permit to purchase firearms.
In Florida, people are not required to hold a permit to own or purchase a handgun, but they need a concealed weapon licence to carry a concealed weapon. However, people may also carry concealed firearms without a licence in the following circumstances:
The minimum age required to purchase, own or carry a concealed weapon is 21.
Now consider the three states with the fewest incidents of gun violence: Hawaii, Vermont and Wyoming. Hawaii requires a permit to acquire a handgun, and persons acquiring one must be at least 21 years old. A licence is also required to carry a firearm in public. Wyoming's gun laws, by contrast, are lenient: no permit is required to purchase a firearm, and permitless concealed carry for residents has been legal there since 2011.
In Vermont, open carry is legal and requires no licence, and no licence is required to carry a concealed firearm either. Vermont does not prohibit possession of machine guns, and no permit is required to purchase a firearm. The legal age to purchase, own or carry a gun is 16.
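Gun control laws may not determine the number of gun violence incidents; other factors may be at play. Vermont has some of the most lenient gun control laws, yet it is one of the states with the fewest incidents of gun violence. Illinois, which has far more stringent gun control laws than Vermont, is one of the states with the most incidents. However, these comparisons use raw incident counts that are not adjusted for population. Vermont and Wyoming are among the least populous states, so per-capita rates would give a fairer comparison.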
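A per-capita comparison divides each state's incident count by its population. This is a minimal sketch: `incidents_per_100k` is a hypothetical helper, and the numbers below are made up for illustration. Real population estimates would have to come from an external source such as the US Census Bureau, since the dataset does not include them.

```python
import pandas as pd

def incidents_per_100k(incident_counts, population):
    """Incidents per 100,000 residents for states present in both Series."""
    rates = incident_counts / population * 100_000
    return rates.dropna()  # drop states missing from either Series

# Hypothetical figures for illustration only; not real incident or census data
counts = pd.Series({"State A": 5000, "State B": 300})
population = pd.Series({"State A": 10_000_000, "State B": 600_000})
rates = incidents_per_100k(counts, population)
print(rates)
```

In this toy example, State A has over 16 times as many incidents as State B, yet both have the same rate of 50 incidents per 100,000 residents. This is why raw counts alone can mislead.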